Ayush Patel
At the RLadies Bangalore Meetup
28-Aug-2022
@ayushbipinpatel
ayush.ap58@gmail.com
@AyushBipinPatel
1. Basics of Rmarkdown
Advanced RMarkdown knowledge is not required
Essentials:
Choices based on preference:
Modifying an existing function to efficiently fit with a particular use case.
purrr::map() family functions
Instead of calling the wrapper function multiple times with different inputs, use the map family of functions to apply the wrapper function to the desired sequence of inputs.
Three things to keep in mind:
This means you will essentially create your own function. Consider this use case.
I want to show that as the sample size increases, the sample mean gets closer to the population mean.
Primary function to wrap. rnorm() can be used to draw a random sample from a normal population with a desired mean and standard deviation.
Desired Output. We want random numbers from a population of fixed mean and fixed SD, but with different numbers of observations.
Required Inputs. We have three inputs. The fixed mean, fixed SD and the number of observations.
fixed_mean <- 8
fixed_sd <- 2
vec_norm <- rnorm(n = 10,mean = fixed_mean,sd = fixed_sd) # 3 inputs
vec_norm
[1] 6.948382 10.505596 5.740777 7.999065 7.502900 8.771539 5.163323
[8] 7.480460 5.600220 4.794188
mean(vec_norm)
[1] 7.050645
wrap_rnorm <- function(pass_n_value) {
  rnorm(n = pass_n_value, mean = fixed_mean, sd = fixed_sd)
}
vec_norm2 <- wrap_rnorm(pass_n_value = 20)
vec_norm2
[1] 9.630968 5.959133 9.633806 8.150890 5.915478 6.755224 10.735542
[8] 6.551124 7.989904 8.981636 7.811034 11.312163 8.916044 9.189349
[15] 9.701936 5.733621 6.905854 10.815325 11.259237 8.219931
mean(vec_norm2)
[1] 8.50841
The function wrap_rnorm() takes one value (say, n): the number of observations. It then randomly generates n observations from a population of fixed mean and fixed SD.
wrap_rnorm(pass_n_value = 10)
[1] 12.618119 9.223980 8.585175 8.389362 7.264103 6.133763 6.518785
[8] 7.376522 7.864373 8.365424
wrap_rnorm(pass_n_value = 20)
[1] 6.387129 8.124797 7.891840 7.704465 10.576320 9.737834 8.876747
[8] 5.373789 7.548953 10.297046 6.845207 11.648763 10.734660 7.957484
[15] 8.517344 6.920973 5.107091 7.952919 7.081092 7.019311
wrap_rnorm(pass_n_value = 30)
[1] 6.560223 8.048634 12.202200 10.293537 7.065803 7.125337 7.987380
[8] 8.831583 3.588277 2.772305 8.413338 5.130236 3.962421 8.199020
[15] 9.129445 5.766644 5.534138 7.345632 9.517989 11.373459 10.362150
[22] 11.731038 9.502530 9.637268 5.814128 6.854545 8.256952 6.914727
[29] 8.316611 8.014058
But typing this call multiple times, or copy-pasting it more than twice, is not ideal.
This is where purrr::map() can help us.
purrr::map() family - The Second Key
[1] 9.277246 9.641011 8.223049 6.837769 6.293026 8.419360 4.221592 5.978006
[9] 8.291693 4.890595
[1] 10.662691 6.660318 8.136084 8.273103 4.673990 4.073667 6.608768
[8] 10.553002 5.648907 10.678810 9.810218 7.450201 12.246437 10.078057
[15] 9.826700 10.830182 6.744612 8.320719 7.824346 9.483533
[1] 7.544110 7.957464 10.360584 6.458247 7.957659 9.231404 7.645065
[8] 4.849479 3.698372 9.564612 9.379014 5.769380 8.279075 10.821331
[15] 9.012934 8.599845 5.697840 7.351044 13.253735 7.386874 7.218736
[22] 8.535865 10.349491 8.745315 9.859614 6.145464 8.662354 8.967474
[29] 8.149633 7.770852
[[1]]
[1] 10.816358 8.185007 7.427629 9.701427 9.765678 5.455677 5.022867
[8] 8.137811 6.294769 12.163303
[[2]]
[1] 8.874205 6.458478 8.397012 7.675455 8.285075 9.248115 6.769382
[8] 8.274423 8.479237 7.037915 7.463629 9.427760 10.834785 9.288717
[15] 7.906149 9.209483 5.114173 6.765709 6.464328 5.945079
[[3]]
[1] 6.499760 9.197877 8.827299 6.373612 6.428216 9.472122 7.703281 6.264655
[9] 6.074504 6.371418 7.609697 7.992328 8.772184 5.284319 8.668518 6.659431
[17] 8.246815 8.134401 7.870927 8.885915 5.479374 9.391986 8.595907 9.444719
[25] 8.359750 9.802525 9.855422 8.697819 6.230392 6.811421
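A minimal sketch of a call that produces a list like the one above, assuming the {purrr} package is installed. The names sample_sizes, samples and sample_means are illustrative and not taken from the slides, and set.seed() is added only so the sketch is reproducible.

```r
library(purrr)

fixed_mean <- 8
fixed_sd <- 2

# Wrapper from the slides: only the number of observations varies
wrap_rnorm <- function(pass_n_value) {
  rnorm(n = pass_n_value, mean = fixed_mean, sd = fixed_sd)
}

set.seed(123)  # assumption: seed added for reproducibility

sample_sizes <- c(10, 20, 30)  # illustrative sequence of inputs

# One map() call replaces three hand-typed calls;
# the result is a list with one numeric vector per input
samples <- map(sample_sizes, wrap_rnorm)

# map_dbl() returns a plain numeric vector, here one mean per sample
sample_means <- map_dbl(samples, mean)

lengths(samples)
```

With larger values in sample_sizes, the entries of sample_means can be compared against fixed_mean to explore the use case stated earlier.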
What if the function takes more than one argument?
[1] 99.79911 94.50820 101.86479 97.39179 100.61663 103.46590 92.10241
[8] 98.48696 100.98396 109.25042
[1] -58.716636 -15.598206 8.496584 -31.628217 -16.963207
[1] -75.895778 -59.308389 -28.836774 -48.361434 -45.958544 -39.201747
[7] -35.239147 -48.500105 -38.126520 -25.058001 -50.352329 -74.676758
[13] -50.054358 -8.709084 -64.967992
[[1]]
[1] 100.19341 98.48849 101.88900 93.10251 98.71095 101.92998 108.96508
[8] 98.74750 99.33220 97.75242
[[2]]
[1] -85.03220 68.35520 -41.19661 38.62527 40.88699
[[3]]
[1] -85.491147 -70.807020 -75.356104 -76.513720 -63.172638 -65.915728
[7] -11.590671 -28.690708 -19.321058 5.820102 -53.210837 -10.895302
[13] -36.859627 -46.937393 -57.539149
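When the function needs several inputs, one option is purrr::pmap(), which steps through parallel vectors and passes one element of each to the function. The sketch below is an assumption about how output like the above could be produced; the sizes, means and standard deviations are illustrative values, not the ones used in the slides.

```r
library(purrr)

set.seed(123)  # assumption: seed added for reproducibility

# One vector per argument of rnorm(); the list names match the argument names.
# All values here are illustrative.
inputs <- list(
  n    = c(10, 5, 15),
  mean = c(100, -20, -50),
  sd   = c(5, 40, 20)
)

# pmap() calls rnorm(n = 10, mean = 100, sd = 5), then
# rnorm(n = 5, mean = -20, sd = 40), and so on
samples <- pmap(inputs, rnorm)

# For exactly two varying arguments, map2() is a lighter alternative
samples_two_args <- map2(
  c(10, 5), c(100, -20),
  function(n, mean) rnorm(n = n, mean = mean, sd = 1)
)

lengths(samples)
```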
We now move on to parameterised reports.
.rmd files?
Answering these three questions will provide a strong intuition about Parameterised Reports.
IT IS JUST A FANCY NAME FOR VALUES.
I think of parameters as values that a .rmd assumes before it is rendered/knitted.
These are declared in the YAML header of the .rmd file.
A single .rmd file can have one or more parameters.
---
title: My Document
output: html_document
params:
  year: 2018
  region: Europe
  printcode: TRUE
  date: !r Sys.Date()
---
The parameters declared in the YAML header can then be accessed and used anywhere in the .rmd file.
params$year
params$region
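While a document is being knitted, params behaves like a named list. The sketch below mimics that with an ordinary list so it can be run outside of knitting; the params object here is only a stand-in, and report_heading is a hypothetical name.

```r
# Stand-in for the `params` list that is available while knitting
# (assumption: values copied from the YAML example in the slides)
params <- list(year = 2018, region = "Europe", printcode = TRUE)

# Parameters are used like any other value, for example to build a heading
report_heading <- paste0("Summary for ", params$region, " (", params$year, ")")
report_heading
```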
Change the params in the YAML header as needed and press the Knit button.
---
title: My Document
output: html_document
params:
  year: 2018 # change values here and press knit button
  region: Europe # change values here and press knit button
  printcode: TRUE # change values here and press knit button
  date: !r Sys.Date() # change values here and press knit button
---
All the files should be contained in a project.
I prefer this structure — this is opinionated. Feel free to deviate from this.
Directory tree
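The directory tree itself is not reproduced here. As a rough sketch only: the scripts and data_prepared folders are mentioned in the steps of this talk, while data_raw, reports and the project name below are assumptions added for illustration. The skeleton is created inside a temporary directory so nothing is written to the working directory.

```r
# Hypothetical project name, created under tempdir() for safety
project_root <- file.path(tempdir(), "param_reports_demo")

folders <- c(
  "scripts",        # named in the slides: cleaning script and template .rmd
  "data_prepared",  # named in the slides: cleaned data
  "data_raw",       # assumption: a place for the untouched raw data
  "reports"         # assumption: a place for the rendered reports
)

for (folder in folders) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

list.dirs(project_root, full.names = FALSE, recursive = FALSE)
```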
From here on, I shall complement the slides with an example I have created for generating parameterised reports with Rmarkdown and Purrr.
The github repository for this can be accessed here.
The final output and explanation can be accessed here. This is written so that it can be used as a standalone resource.
I have village amenities data from the Indian census 2011, for Gujarat state.
I want a report at district level. This means every district will have its own report. This is also where we decide on the parameters that we shall need.
The report should have the following:
Create a separate R script which will clean, wrangle and make all necessary changes to the raw data (script_clean_raw_data.R in the scripts folder).
Save the prepared data in an appropriate location (the data_prepared folder).
All the reports will be generated from the structure defined by this .rmd file.
Declare all the parameters that were decided on in this .rmd file.
The analysis flow is carried out in this file.
I suggest writing this .rmd file with some concrete values in mind that its params can take. This makes it easier to implement the analysis flow.
This .rmd file can be stored in the scripts folder.
Create an R script for functionally generating multiple parameterised reports.
In this script, create a wrapper function around the rmarkdown::render() function.
Once this function is created, create vectors/lists, one for each param, that will contain the sequence of values to be passed to that parameter.
Use the appropriate {purrr} function (if there are two or more params, pmap() is the way to go) to apply the wrapper function over the vectors/lists of param inputs. This will generate all your reports and save them in the location specified.
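The steps above can be sketched as follows. This is a sketch under stated assumptions, not the code from the example repository: the template name report_template.Rmd, the reports output folder, the wrapper name render_report and the parameters district and show_code are all hypothetical. pwalk() is used because rendering is a side effect; pmap() would work the same way and also return the output paths.

```r
library(purrr)

# Hypothetical paths; adapt them to the actual project
template_rmd <- file.path("scripts", "report_template.Rmd")
output_dir   <- "reports"

# Wrapper around rmarkdown::render(): one call renders one report.
# `district` and `show_code` are hypothetical params that the
# template's YAML header would have to declare.
render_report <- function(district, show_code) {
  rmarkdown::render(
    input       = template_rmd,
    output_file = paste0("report_", district, ".html"),
    output_dir  = output_dir,
    params      = list(district = district, show_code = show_code),
    envir       = new.env()  # fresh environment so reports do not share objects
  )
}

# One vector per param, all of the same length (illustrative values)
param_inputs <- list(
  district  = c("Kachchh", "Surat"),
  show_code = c(FALSE, FALSE)
)

# Render only when the template and pandoc are actually available
if (file.exists(template_rmd) && rmarkdown::pandoc_available()) {
  pwalk(param_inputs, render_report)
}
```

Because the list names in param_inputs match the wrapper's argument names, pwalk() pairs them up by name rather than by position.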
The same process can be followed for creating parameterised reports with .qmd files as well.
There is one major difference: instead of rmarkdown::render() we need to use quarto::quarto_render(). Note that quarto_render() does not have the output_dir argument, therefore all reports from the .qmd files are generated in the same directory as the .qmd file.
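A comparable sketch for .qmd files, assuming the {quarto} package and the Quarto CLI are installed. The template name report_template.qmd, the wrapper name render_qmd_report and the district parameter are hypothetical; parameters are passed through the execute_params argument of quarto_render().

```r
library(purrr)

# Hypothetical template name; adapt it to the actual project
template_qmd <- "report_template.qmd"

# Wrapper around quarto::quarto_render(): one call renders one report.
# `district` is a hypothetical param declared in the .qmd YAML header.
render_qmd_report <- function(district) {
  quarto::quarto_render(
    input          = template_qmd,
    output_file    = paste0("report_", district, ".html"),
    execute_params = list(district = district)
  )
}

districts <- c("Kachchh", "Surat")  # illustrative values

# Render only when the template and the {quarto} package are available
if (file.exists(template_qmd) && requireNamespace("quarto", quietly = TRUE)) {
  walk(districts, render_qmd_report)
}
```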
Chapter 15 in R Markdown: The Definitive Guide
{purrr}
Chapter 19 in R4DS
Tom Mock, for getting started with Quarto
Slidecraft by Emil Hvitfeldt